We are asked to analyze movement and tracking data of GAStech employees to identity anomolies and suspicious behaviour.
GAStech is a company that is located in a country island of Kronos and it has come to their attention that some of the employees had mysteriously went missing. Vehicles tracking data that was secretly installed in the company’s cars and Kronos-Kares benefit card information are delivered to authorities for investigation.
packages = c('tidyverse', 'lubridate', 'dplyr', 'raster', 'clock', 'sf', 'tmap', 'plotly', 'ggplot2', 'mapview', 'rgdal',
'rgeos', 'tidyr', 'timevis')
for (p in packages) {
if (!require(p, character.only = T)) {
install.packages(p, repos = "http://cran.us.r-project.org")
}
library(p, character.only = T)
}
carAssignment <- read_csv("mc2/car-assignments.csv")
carAssignment
# A tibble: 44 x 5
LastName FirstName CarID CurrentEmploymentT… CurrentEmploymentT…
<chr> <chr> <dbl> <chr> <chr>
1 Calixto Nils 1 Information Techno… IT Helpdesk
2 Azada Lars 2 Engineering Engineer
3 Balas Felix 3 Engineering Engineer
4 Barranco Ingrid 4 Executive SVP/CFO
5 Baza Isak 5 Information Techno… IT Technician
6 Bergen Linnea 6 Information Techno… IT Group Manager
7 Orilla Elsa 7 Engineering Drill Technician
8 Alcazar Lucas 8 Information Techno… IT Technician
9 Cazar Gustav 9 Engineering Drill Technician
10 Campo-Corr… Ada 10 Executive SVP/CIO
# … with 34 more rows
ccData <- read_csv("MC2/cc_data.csv")
ccData$timestamp = date_time_parse(ccData$timestamp, zone = "", format = "%m/%d/%Y %H:%M")
ccData <- ccData %>%
mutate(date = as.Date(timestamp), time = strftime(timestamp, "%H:%M"), hr = strftime(timestamp, "%H"))
loyaltyData <- read_csv("MC2/loyalty_data.csv") %>%
mutate(date = as.Date(mdy(timestamp)))
ccLoyalty <- left_join(ccData, loyaltyData, by = c("date", "location", "price")) %>%
dplyr::select(timestamp.x, date, time, location, price, last4ccnum, loyaltynum, hr) %>%
rename(timestamp = timestamp.x) %>%
group_by(last4ccnum)
ccLoyalty$weekday = wday(ccLoyalty$date, label = TRUE, abbr = TRUE)
ccLoyalty$last4ccnum = as.character(ccLoyalty$last4ccnum)
ccLoyalty
# A tibble: 1,496 x 9
# Groups: last4ccnum [55]
timestamp date time location price last4ccnum
<dttm> <date> <chr> <chr> <dbl> <chr>
1 2014-01-06 07:28:00 2014-01-05 07:28 Brew've Been… 11.3 4795
2 2014-01-06 07:34:00 2014-01-05 07:34 Hallowed Gro… 52.2 7108
3 2014-01-06 07:35:00 2014-01-05 07:35 Brew've Been… 8.33 6816
4 2014-01-06 07:36:00 2014-01-05 07:36 Hallowed Gro… 16.7 9617
5 2014-01-06 07:37:00 2014-01-05 07:37 Brew've Been… 4.24 7384
6 2014-01-06 07:38:00 2014-01-05 07:38 Brew've Been… 4.17 5368
7 2014-01-06 07:42:00 2014-01-05 07:42 Coffee Camel… 28.7 7253
8 2014-01-06 07:43:00 2014-01-05 07:43 Brew've Been… 9.6 4948
9 2014-01-06 07:43:00 2014-01-05 07:43 Brew've Been… 16.9 9683
10 2014-01-06 07:47:00 2014-01-05 07:47 Hallowed Gro… 16.5 8129
# … with 1,486 more rows, and 3 more variables: loyaltynum <chr>,
# hr <chr>, weekday <ord>
[1] "é"
\xc3\xa9
# A tibble: 1,496 x 9
# Groups: last4ccnum [55]
timestamp date time location price last4ccnum
<dttm> <date> <chr> <chr> <dbl> <chr>
1 2014-01-06 07:28:00 2014-01-05 07:28 Brew've Been… 11.3 4795
2 2014-01-06 07:34:00 2014-01-05 07:34 Hallowed Gro… 52.2 7108
3 2014-01-06 07:35:00 2014-01-05 07:35 Brew've Been… 8.33 6816
4 2014-01-06 07:36:00 2014-01-05 07:36 Hallowed Gro… 16.7 9617
5 2014-01-06 07:37:00 2014-01-05 07:37 Brew've Been… 4.24 7384
6 2014-01-06 07:38:00 2014-01-05 07:38 Brew've Been… 4.17 5368
7 2014-01-06 07:42:00 2014-01-05 07:42 Coffee Camel… 28.7 7253
8 2014-01-06 07:43:00 2014-01-05 07:43 Brew've Been… 9.6 4948
9 2014-01-06 07:43:00 2014-01-05 07:43 Brew've Been… 16.9 9683
10 2014-01-06 07:47:00 2014-01-05 07:47 Hallowed Gro… 16.5 8129
# … with 1,486 more rows, and 3 more variables: loyaltynum <chr>,
# hr <chr>, weekday <ord>
# A tibble: 64 x 2
# Groups: last4ccnum [55]
last4ccnum loyaltynum
<chr> <chr>
1 1286 L3572
2 1286 L3288
3 1310 L8012
4 1321 L4149
5 1415 L7783
6 1874 L4424
7 1877 L3014
8 2142 L9637
9 2276 L3317
10 2418 L9018
# … with 54 more rows
# A tibble: 64 x 3
# Groups: last4ccnum [55]
last4ccnum loyaltynum ccPerson
<chr> <chr> <int>
1 1286 L3572 1
2 1286 L3288 1
3 1310 L8012 2
4 1321 L4149 3
5 1415 L7783 4
6 1874 L4424 5
7 1877 L3014 6
8 2142 L9637 7
9 2276 L3317 8
10 2418 L9018 9
# … with 54 more rows
loyaltynum last4ccnum.x ccPerson last4ccnum.y
1 L3288 1286 1 9241
2 L3288 1286 1 1286
3 L3572 1286 1 1286
4 L8012 1310 2 1310
5 L4149 1321 3 1321
6 L7783 1415 4 1415
7 L4424 1874 5 1874
8 L3014 1877 6 1877
9 L9637 2142 7 2142
10 L3317 2276 8 2276
11 L9018 2418 9 2418
12 L6886 2463 10 2463
13 L5947 2540 11 2540
14 L1107 2681 12 2681
15 L2490 3484 13 3484
16 L7814 3492 14 3492
17 L7761 3506 15 3506
18 L9362 3547 16 3547
19 L1485 3853 17 3853
20 L2169 4434 18 4434
21 L8477 4530 19 4530
22 L2070 4795 20 4795
23 L2070 4795 20 8332
24 L8566 4795 20 8332
25 L8566 4795 20 4795
26 L3295 4948 21 5921
27 L3295 4948 21 4948
28 L9406 4948 21 5921
29 L9406 4948 21 8202
30 L9406 4948 21 4948
31 L2459 5010 22 5010
32 L2247 5368 23 7889
33 L2247 5368 23 5368
34 L6119 5368 23 5368
35 L6119 5368 23 7889
36 L4034 5407 24 5407
37 L3295 5921 25 5921
38 L3295 5921 25 4948
39 L9406 5921 25 5921
40 L9406 5921 25 8202
41 L9406 5921 25 4948
42 L6267 6691 26 6899
43 L6267 6691 26 6691
44 L8148 6816 27 6816
45 L3366 6895 28 6895
46 L6267 6899 29 6899
47 L6267 6899 29 6691
48 L9363 6901 30 6901
49 L6544 7108 31 7108
50 L6417 7117 32 7117
51 L1682 7253 33 7253
52 L6110 7354 34 7354
53 L6110 7354 34 8411
54 L9254 7354 34 7354
55 L3800 7384 35 7384
56 L4164 7688 36 7688
57 L5756 7792 37 7792
58 L5259 7819 38 7819
59 L2247 7889 39 7889
60 L2247 7889 39 5368
61 L6119 7889 39 5368
62 L6119 7889 39 7889
63 L8328 8129 40 8129
64 L5224 8156 41 8156
65 L2343 8202 42 8202
66 L9406 8202 42 5921
67 L9406 8202 42 8202
68 L9406 8202 42 4948
69 L2070 8332 43 4795
70 L2070 8332 43 8332
71 L8566 8332 43 8332
72 L8566 8332 43 4795
73 L6110 8411 44 7354
74 L6110 8411 44 8411
75 L2769 8642 45 8642
76 L5485 9152 46 9152
77 L4063 9220 47 9220
78 L3288 9241 48 9241
79 L3288 9241 48 1286
80 L3259 9405 49 9405
81 L5777 9551 50 9551
82 L5924 9614 51 9614
83 L5553 9617 52 9617
84 L3191 9635 53 9635
85 L7291 9683 54 9683
86 L9633 9735 55 9735
ccPerson last4ccnum.y
1 1 9241
2 1 1286
3 2 1310
4 3 1321
5 4 1415
6 5 1874
7 6 1877
8 7 2142
9 8 2276
10 9 2418
11 10 2463
12 11 2540
13 12 2681
14 13 3484
15 14 3492
16 15 3506
17 16 3547
18 17 3853
19 18 4434
20 19 4530
21 20 4795
22 20 8332
23 21 5921
24 21 4948
25 21 8202
26 22 5010
27 23 7889
28 23 5368
29 24 5407
30 26 6899
31 26 6691
32 27 6816
33 28 6895
34 30 6901
35 31 7108
36 32 7117
37 33 7253
38 34 7354
39 34 8411
40 35 7384
41 36 7688
42 37 7792
43 38 7819
44 40 8129
45 41 8156
46 45 8642
47 46 9152
48 47 9220
49 49 9405
50 50 9551
51 51 9614
52 52 9617
53 53 9635
54 54 9683
55 55 9735
loyaltynum last4ccnum.x ccPerson last4ccnum.y
1 L3288 1286 1 9241
2 L3288 1286 1 1286
3 L3572 1286 1 1286
4 L8012 1310 2 1310
5 L4149 1321 3 1321
6 L7783 1415 4 1415
7 L4424 1874 5 1874
8 L3014 1877 6 1877
9 L9637 2142 7 2142
10 L3317 2276 8 2276
11 L9018 2418 9 2418
12 L6886 2463 10 2463
13 L5947 2540 11 2540
14 L1107 2681 12 2681
15 L2490 3484 13 3484
16 L7814 3492 14 3492
17 L7761 3506 15 3506
18 L9362 3547 16 3547
19 L1485 3853 17 3853
20 L2169 4434 18 4434
21 L8477 4530 19 4530
22 L2070 4795 20 4795
23 L2070 4795 20 8332
24 L8566 4795 20 8332
25 L8566 4795 20 4795
26 L3295 4948 21 5921
27 L3295 4948 21 4948
28 L9406 4948 21 5921
29 L9406 4948 21 8202
30 L9406 4948 21 4948
31 L2459 5010 22 5010
32 L2247 5368 23 7889
33 L2247 5368 23 5368
34 L6119 5368 23 5368
35 L6119 5368 23 7889
36 L4034 5407 24 5407
37 L6267 6691 26 6899
38 L6267 6691 26 6691
39 L8148 6816 27 6816
40 L3366 6895 28 6895
41 L9363 6901 30 6901
42 L6544 7108 31 7108
43 L6417 7117 32 7117
44 L1682 7253 33 7253
45 L6110 7354 34 7354
46 L6110 7354 34 8411
47 L9254 7354 34 7354
48 L3800 7384 35 7384
49 L4164 7688 36 7688
50 L5756 7792 37 7792
51 L5259 7819 38 7819
52 L8328 8129 40 8129
53 L5224 8156 41 8156
54 L2769 8642 45 8642
55 L5485 9152 46 9152
56 L4063 9220 47 9220
57 L3259 9405 49 9405
58 L5777 9551 50 9551
59 L5924 9614 51 9614
60 L5553 9617 52 9617
61 L3191 9635 53 9635
62 L7291 9683 54 9683
63 L9633 9735 55 9735
# A tibble: 1,496 x 10
# Groups: last4ccnum [55]
ccPerson timestamp weekday date time location
<int> <dttm> <ord> <date> <chr> <chr>
1 1 2014-01-06 08:16:00 Mon 2014-01-06 08:16 Brew've Been…
2 1 2014-01-06 12:00:00 Mon 2014-01-06 12:00 Jack's Magic…
3 1 2014-01-06 13:27:00 Mon 2014-01-06 13:27 Abila Zacharo
4 1 2014-01-06 19:50:00 Mon 2014-01-06 19:50 Frydos Autos…
5 1 2014-01-07 07:54:00 Mon 2014-01-06 07:54 Brew've Been…
6 1 2014-01-07 12:00:00 Tue 2014-01-07 12:00 Jack's Magic…
7 1 2014-01-07 13:24:00 Tue 2014-01-07 13:24 Kalami Kafen…
8 1 2014-01-07 20:15:00 Tue 2014-01-07 20:15 Ouzeri Elian
9 1 2014-01-08 08:16:00 Wed 2014-01-08 08:16 Brew've Been…
10 1 2014-01-08 12:00:00 Wed 2014-01-08 12:00 Jack's Magic…
# … with 1,486 more rows, and 4 more variables: price <dbl>,
# last4ccnum <chr>, loyaltynum.x <chr>, hr <chr>
[1] 48
loyaltynum last4ccnum.x ccPerson last4ccnum.y
1 L1107 2681 12 2681
2 L1485 3853 17 3853
3 L1682 7253 33 7253
4 L2070 8332 43 8332
5 L2070 4795 20 4795
6 L2169 4434 18 4434
7 L2247 5368 23 5368
8 L2247 7889 39 7889
9 L2343 8202 42 8202
10 L2459 5010 22 5010
11 L2490 3484 13 3484
12 L2769 8642 45 8642
13 L3014 1877 6 1877
14 L3191 9635 53 9635
15 L3259 9405 49 9405
16 L3288 1286 1 1286
17 L3288 9241 48 9241
18 L3295 5921 25 5921
19 L3295 4948 21 4948
20 L3317 2276 8 2276
21 L3366 6895 28 6895
22 L3572 1286 1 1286
23 L3800 7384 35 7384
24 L4034 5407 24 5407
25 L4063 9220 47 9220
26 L4149 1321 3 1321
27 L4164 7688 36 7688
28 L4424 1874 5 1874
29 L5224 8156 41 8156
30 L5259 7819 38 7819
31 L5485 9152 46 9152
32 L5553 9617 52 9617
33 L5756 7792 37 7792
34 L5777 9551 50 9551
35 L5924 9614 51 9614
36 L5947 2540 11 2540
37 L6110 8411 44 8411
38 L6110 7354 34 7354
39 L6119 7889 39 7889
40 L6119 5368 23 5368
41 L6267 6691 26 6691
42 L6267 6899 29 6899
43 L6417 7117 32 7117
44 L6544 7108 31 7108
45 L6886 2463 10 2463
46 L7291 9683 54 9683
47 L7761 3506 15 3506
48 L7783 1415 4 1415
49 L7814 3492 14 3492
50 L8012 1310 2 1310
51 L8148 6816 27 6816
52 L8328 8129 40 8129
53 L8477 4530 19 4530
54 L8566 4795 20 4795
55 L8566 8332 43 8332
56 L9018 2418 9 2418
57 L9254 7354 34 7354
58 L9362 3547 16 3547
59 L9363 6901 30 6901
60 L9406 8202 42 8202
61 L9406 4948 21 4948
62 L9406 5921 25 5921
63 L9633 9735 55 9735
64 L9637 2142 7 2142
gps <- read_csv("MC2/gps.csv") %>%
mutate(date = as.Date(mdy_hms(Timestamp)), time = format(mdy_hms(Timestamp), "%H:%M"))
gps$Timestamp <- date_time_parse(gps$Timestamp, zone = "", format = "%m/%d/%Y %H:%M:%S")
gps$hr <- strftime(gps$Timestamp, "%H")
gps$id <- as_factor(gps$id)
gps$weekday = wday(gps$date, label = TRUE, abbr = TRUE)
gps
# A tibble: 685,169 x 8
Timestamp id lat long date time hr
<dttm> <fct> <dbl> <dbl> <date> <chr> <chr>
1 2014-01-06 06:28:01 35 36.1 24.9 2014-01-06 06:28 06
2 2014-01-06 06:28:01 35 36.1 24.9 2014-01-06 06:28 06
3 2014-01-06 06:28:03 35 36.1 24.9 2014-01-06 06:28 06
4 2014-01-06 06:28:05 35 36.1 24.9 2014-01-06 06:28 06
5 2014-01-06 06:28:06 35 36.1 24.9 2014-01-06 06:28 06
6 2014-01-06 06:28:07 35 36.1 24.9 2014-01-06 06:28 06
7 2014-01-06 06:28:09 35 36.1 24.9 2014-01-06 06:28 06
8 2014-01-06 06:28:10 35 36.1 24.9 2014-01-06 06:28 06
9 2014-01-06 06:28:11 35 36.1 24.9 2014-01-06 06:28 06
10 2014-01-06 06:28:12 35 36.1 24.9 2014-01-06 06:28 06
# … with 685,159 more rows, and 1 more variable: weekday <ord>
I am eliminating coordinates that indicating that the car is moving The GPS car coordinates are recorded every 1-5 secs. Therefore, if there is a GPS record difference of more than 5 min, which means the employee has driven the car to a destination. Thus this eliminates possible traffic light stops and car moving in motion data.
For each employee: 1. I am getting the first and last car coordinate each day 2. Getting places of interest through the day
ts <- gps %>%
group_by(id) %>%
arrange(date, time, by_group=TRUE) %>%
mutate(diff = round(c(difftime(tail(Timestamp, -1), head(Timestamp, -1), units = "mins"), 0)), 2) %>%
mutate(count = 1:n(), FIRST = count == 1, LAST = count == max(count)) %>%
filter(diff > 5 | FIRST == TRUE | LAST == TRUE) %>%
arrange(id) %>%
select(id, lat, long, date, time, diff, hr, weekday)
ts
# A tibble: 3,133 x 8
# Groups: id [40]
id lat long date time diff hr weekday
<fct> <dbl> <dbl> <date> <chr> <drtn> <chr> <ord>
1 1 36.1 24.9 2014-01-06 07:20 0 mins 07 Mon
2 1 36.1 24.9 2014-01-06 07:22 35 mins 07 Mon
3 1 36.0 24.9 2014-01-06 08:04 253 mins 08 Mon
4 1 36.1 24.9 2014-01-06 12:26 59 mins 12 Mon
5 1 36.0 24.9 2014-01-06 13:34 250 mins 13 Mon
6 1 36.1 24.9 2014-01-06 17:48 108 mins 17 Mon
7 1 36.1 24.9 2014-01-06 19:42 7 mins 19 Mon
8 1 36.1 24.9 2014-01-06 19:49 38 mins 19 Mon
9 1 36.1 24.9 2014-01-06 20:33 98 mins 20 Mon
10 1 36.0 24.9 2014-01-06 22:15 46 mins 22 Mon
# … with 3,123 more rows
# A tibble: 3,777 x 15
# Groups: id [37]
id lat long date time.x diff hr weekday.x
<fct> <dbl> <dbl> <date> <chr> <drtn> <chr> <ord>
1 1 36.1 24.9 2014-01-06 07:20 0 mins 07 Mon
2 1 36.1 24.9 2014-01-06 07:20 0 mins 07 Mon
3 1 36.1 24.9 2014-01-06 07:20 0 mins 07 Mon
4 1 36.1 24.9 2014-01-06 07:20 0 mins 07 Mon
5 1 36.1 24.9 2014-01-06 07:20 0 mins 07 Mon
6 1 36.1 24.9 2014-01-06 07:20 0 mins 07 Mon
7 1 36.1 24.9 2014-01-06 07:20 0 mins 07 Mon
8 1 36.1 24.9 2014-01-06 07:20 0 mins 07 Mon
9 1 36.1 24.9 2014-01-06 07:20 0 mins 07 Mon
10 1 36.1 24.9 2014-01-06 07:20 0 mins 07 Mon
# … with 3,767 more rows, and 7 more variables: timestamp <dttm>,
# time.y <chr>, location <chr>, price <dbl>, last4ccnum <chr>,
# loyaltynum <chr>, weekday.y <ord>
class : RasterLayer
band : 1 (of 3 bands)
dimensions : 1618, 2716, 4394488 (nrow, ncol, ncell)
resolution : 3.148491e-05, 3.148491e-05 (x, y)
extent : 24.82429, 24.9098, 36.04476, 36.0957 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs
source : /Users/yuntinghong/Documents/SMU/ISSS608 - Visual Analytics/hongyunting/YTBlog_ISSS608/_posts/2021-07-04-yt-assignment-mc2/MC2/Geospatial/MC2-tourist_modified.tif
names : MC2.tourist_modified
values : 0, 255 (min, max)

OGR data source with driver: ESRI Shapefile
Source: "/Users/yuntinghong/Documents/SMU/ISSS608 - Visual Analytics/hongyunting/YTBlog_ISSS608/_posts/2021-07-04-yt-assignment-mc2/MC2/Geospatial", layer: "Abila"
with 3290 features
It has 9 fields
Integer64 fields read as strings: TLID
class : SpatialLinesDataFrame
features : 3290
extent : 24.82401, 24.90997, 36.04501, 36.09492 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs
variables : 9
names : TLID, FEDIRP, FENAME, FETYPE, FEDIRS, FRADDL, TOADDL, FRADDR, TOADDR
min values : 184619449, N, Acera, Ave, N, 1, 20, 1, 17
max values : 184714869, S, Zefirou, Way, N, 7400, 7498, 7401, 7499
gps_sf_path <- st_as_sf(ts, coords = c("long", "lat"), crs = 4326) %>%
#group_by(id) %>%
#summarize(Timestamp = mean(Timestamp),
# do_union=FALSE) %>% #Summarize is unless, but because we want to use group_by, we need to do something
st_cast("POINT")
# Map view of Abila, Kronos's route and Employees' whereabout
tmap_mode("view")
tm_shape(bgmap) +
tm_rgb(bgmap, r = 1, g = 2, b = 3, # setting red to band 1, green to band 2, blue to band 3
alpha = NA,
saturation = 1,
interpolate = TRUE,
max.value = 255) +
tmap_options(max.categories = 44) +
tm_shape(Abila_st) +
tm_lines() +
tm_shape(gps_sf_path) +
tm_dots(col ="id",
popup.vars=c("Date:"="date", "Time:"="time", "Week:"="weekday", "Stopover duration (mins):"="diff"))




As shown in the code chunk above, “Katherina’s Cafe” is identified as the popular location as it has the highest number of transaction made within these 2 weeks, followed by “Hippokampos” and “Guy’s Gyros”. However, “Abila Zacharo” attracts more than 80% of GAStech employees who drove. During Monday to Friday (Working day), the most frequented locations are Food & Beverage where employees go out for lunch or buy a cup of coffee. There
```{.r .distill-force-highlighting-css}